Compressed Dynamic Tries with Applications to LZ-Compression in Sublinear Time and Space

Authors

  • Jesper Jansson
  • Kunihiko Sadakane
  • Wing-Kin Sung
Abstract

The dynamic trie is a fundamental data structure with applications in many areas. This paper proposes a compressed version of the dynamic trie. Our data structure is not only space-efficient but also supports pattern searching in o(|P|) time and leaf insertion/deletion in o(log n) time, where |P| is the length of the pattern and n is the size of the trie. To demonstrate the usefulness of the new data structure, we apply it to the LZ-compression problem. For a string S of length s over an alphabet A of size σ, the previously best known algorithms for computing the Ziv-Lempel encoding (LZ78) of S run in either: (1) O(s) time and O(s log s) bits of working space; or (2) O(sσ) time and O(sH_k + s log σ / log_σ s) bits of working space, where H_k is the k-th order entropy of the text. No previous algorithm runs in sublinear time. Our new data structure implies an LZ-compression algorithm which runs in sublinear time and uses optimal working space. More precisely, the LZ-compression algorithm uses O(s(log σ + log log_σ s) / log_σ s) bits of working space and runs in O(s(log log s)^2 / (log_σ s · log log log s)) worst-case time, which is sublinear when σ = 2^{o(log s · log log log s / (log log s)^2)}.
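For readers unfamiliar with LZ78, the following is a minimal sketch of the textbook parse that the abstract refers to, driven by an ordinary uncompressed trie. It is not the compressed dynamic trie proposed in the paper and does not achieve its time or space bounds; the function names `lz78_encode` and `lz78_decode` and the dict-per-node layout are assumptions made purely for illustration. It shows why the trie needs only two operations: walking down along the input, and inserting one new leaf per phrase.

```python
def lz78_encode(text):
    """Return the LZ78 parse of text as (parent_index, symbol) pairs."""
    # children[i] maps a symbol to the index of a child of trie node i.
    # Node 0 is the root and stands for the empty phrase.
    children = [{}]
    output = []
    node = 0
    for ch in text:
        nxt = children[node].get(ch)
        if nxt is not None:
            node = nxt  # the current phrase can still be extended
        else:
            # Leaf insertion: the new phrase is (phrase at node) + ch.
            children[node][ch] = len(children)
            children.append({})
            output.append((node, ch))
            node = 0  # start matching the next phrase from the root
    if node != 0:
        # Input ended inside a known phrase; emit it with an empty symbol.
        output.append((node, ""))
    return output


def lz78_decode(pairs):
    """Rebuild the text from (parent_index, symbol) pairs."""
    phrases = [""]  # phrases[i] is the string spelled by trie node i
    out = []
    for parent, ch in pairs:
        phrase = phrases[parent] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


if __name__ == "__main__":
    pairs = lz78_encode("abababa")
    print(pairs)
    print(lz78_decode(pairs))
```

Each phrase costs one leaf insertion, which is why the update time of the dynamic trie bounds the cost of the whole parse. This sketch stores the trie with plain pointers and so does not reflect the space savings the paper targets.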

Similar resources

Automata on Lempel-ziv Compressed Strings

Using the Lempel-Ziv-78 compression algorithm to compress a string yields a dictionary of substrings, i.e. an edge-labelled tree with an order-compatible enumeration, here called an LZ-trie. Queries about strings translate to queries about LZ-tries and hence can in principle be answered without decompression. We compare notions of automata accepting LZ-tries and consider the relation between ac...

Ziv Lempel Compression of Huge Natural Language Data Tries Using Suffix Arrays

We present a very efficient, in terms of space and access speed, data structure for storing huge natural language data sets. The structure is described as LZ (Ziv Lempel) compressed linked list trie and is a step further beyond directed acyclic word graph in automata compression. We are using the structure to store DELAF, a huge French lexicon with syntactical, grammatical and lexical informati...

Tree-Combined Trie: A Compressed Data Structure for Fast Ip Address Lookup

To meet the requirements of the high-speed Internet and satisfy its users, building fast routers with a high-speed IP address lookup engine is inevitable. Given the unpredictable variations that occur in the forwarding information over time and space, the IP lookup algorithm should be able to adapt itself to temporal and spatial conditions. This paper proposes a new ...

Implementation of VlSI Based Image Compression Approach on Reconfigurable Computing System - A Survey

Image data require huge amounts of disk space and large bandwidths for transmission. Hence, image compression is necessary to reduce the amount of data required to represent a digital image. Therefore, an efficient technique for image compression is in high demand. Although many compression techniques are available, the technique which is faster, memory efficient and simple, surely...

Dynamic index, LZ factorization, and LCE queries in compressed space

In this paper, we present the following results: (1) We propose a new dynamic compressed index of O(w) space, that supports searching for a pattern P in the current text in O(|P| log w + log w log |P| log N (log M) + occ log N) time and insertion/deletion of a substring of length y in O((y + log N log M) log w log N log M) time, where N is the length of the current text, M is the maximum length of the dy...

Publication date: 2007